387 research outputs found

Formula of Entropy along Unstable Foliations for $C^1$ Diffeomorphisms with Dominated Splitting

Authors: Wang Lin, Wang Xinsheng, Zhu Yujun
Publication date: 08/11/2017

Metric entropies along a hierarchy of unstable foliations are investigated for $C^1$ diffeomorphisms with dominated splitting. The analogues of Ruelle's inequality and Pesin's formula, which relate the metric entropy and Lyapunov exponents in each hierarchy, are given.
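For orientation, the classical versions of the two results named in this abstract can be written as follows. This is the standard textbook statement, not the hierarchical version proved in the paper. The notation is an assumption introduced here: $f$ is a $C^1$ diffeomorphism of a compact manifold, $\mu$ is an $f$-invariant Borel probability measure, $h_\mu(f)$ is the metric entropy, and $\lambda_i(x)$ are the Lyapunov exponents at $x$ with multiplicities $m_i(x)$.

$$
h_\mu(f) \;\le\; \int \sum_{i:\,\lambda_i(x) > 0} m_i(x)\,\lambda_i(x)\; d\mu(x) \qquad \text{(Ruelle's inequality)}
$$

Pesin's formula is the equality case, which holds classically when $f$ is $C^{1+\alpha}$ and $\mu$ is absolutely continuous with respect to the Lebesgue measure:

$$
h_\mu(f) \;=\; \int \sum_{i:\,\lambda_i(x) > 0} m_i(x)\,\lambda_i(x)\; d\mu(x).
$$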

arXiv.org e-Print Archive

Crossref

HAQ: Hardware-Aware Automated Quantization with Mixed Precision

Authors: Han Song, Lin Ji, Lin Yujun, Liu Zhijian, Wang Kuan
Publication date: 06/04/2019

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators have begun to support mixed precision (1-8 bits) to further improve computation efficiency, which raises the challenge of finding the optimal bitwidth for each layer: it requires domain experts to explore a vast design space, trading off accuracy, latency, energy, and model size, which is both time-consuming and sub-optimal. Conventional quantization algorithms ignore the differences between hardware architectures and quantize all layers uniformly. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy, and we take the hardware accelerator's feedback in the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback signals (latency and energy) to the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures. Our framework effectively reduced the latency by 1.4-1.95x and the energy consumption by 1.9x with negligible loss of accuracy compared with fixed-bitwidth (8-bit) quantization. Our framework reveals that the optimal policies on different hardware architectures (i.e., edge and cloud architectures) under different resource constraints (i.e., latency, energy, and model size) are drastically different. We interpret the implications of the different quantization policies, which offer insights for both neural network architecture design and hardware architecture design.

Comment: CVPR 2019. The first three authors contributed equally to this work. Project page: https://hanlab.mit.edu/projects/haq
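To make the idea of a per-layer bitwidth policy concrete, here is a minimal Python sketch. It is not the HAQ implementation: it contains no RL agent and no hardware simulator, and the function names, layer names, and the choice of symmetric uniform quantization are all assumptions made for illustration. It only shows how assigning a different bitwidth to each layer changes the quantized weights and the total model size.

```python
from typing import Dict, List


def quantize_weights(weights: List[float], bits: int) -> List[float]:
    """Symmetric uniform quantization of one layer (illustrative assumption).

    This sketch supports 2 or more bits; 1-bit (binary) layers would need
    a different scheme and are left out.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    levels = 2 ** (bits - 1) - 1      # largest positive integer code
    scale = max_abs / levels          # real value represented by one step
    quantized = []
    for w in weights:
        code = round(w / scale)                   # nearest integer code
        code = max(-levels, min(levels, code))    # clamp to the code range
        quantized.append(code * scale)            # map back to a real value
    return quantized


def model_size_bits(layer_sizes: Dict[str, int], policy: Dict[str, int]) -> int:
    """Total weight storage in bits under a per-layer bitwidth policy."""
    return sum(layer_sizes[name] * policy[name] for name in layer_sizes)


# Hypothetical two-layer model and a mixed-precision policy.
layers = {"conv1": [0.4, -1.0, 0.2, 0.9], "fc": [0.1, -0.2, 0.3]}
policy = {"conv1": 8, "fc": 4}

quantized_layers = {name: quantize_weights(w, policy[name]) for name, w in layers.items()}
sizes = {name: len(w) for name, w in layers.items()}
total_bits = model_size_bits(sizes, policy)
```

In an automated search, a policy such as `policy` above would be proposed by the search procedure and scored by accuracy together with latency, energy, or size feedback. Here the policy is simply fixed by hand.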

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Evaluating Performance Persistence in US Open-End Mutual Funds

Authors: Lin Enzo (Chien-Yu), Yu Maggie (Yujun)
Publication date: 01/12/2009

Performance persistence in US open-end mutual funds is a contentious issue. This paper examines performance persistence by analyzing monthly returns of mutual funds under nine investment styles over the period from January 1993 to December 2008. We find some evidence to support the persistence of mutual fund performance. Nevertheless, a zero-investment best-minus-worst strategy does outperform the market with a certain level of consistency.

Simon Fraser University Institutional Repository

Hardware-Centric AutoML for Mixed-Precision Quantization

Authors: Han Song, Lin Ji, Lin Yujun, Liu Zhijian, Wang Kuan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 11/08/2020

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators have begun to support mixed precision (1-8 bits) to further improve computation efficiency, which raises the challenge of finding the optimal bitwidth for each layer: it requires domain experts to explore a vast design space, trading off accuracy, latency, energy, and model size, which is both time-consuming and sub-optimal. Conventional quantization algorithms ignore the differences between hardware architectures and quantize all layers uniformly. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy, and we take the hardware accelerator's feedback in the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback signals (latency and energy) to the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures. Our framework effectively reduced the latency by 1.4-1.95x and the energy consumption by 1.9x with negligible loss of accuracy compared with fixed-bitwidth (8-bit) quantization. Our framework reveals that the optimal policies on different hardware architectures (i.e., edge and cloud architectures) under different resource constraints (i.e., latency, energy, and model size) are drastically different. We interpret the implications of the different quantization policies, which offer insights for both neural network architecture design and hardware architecture design.

Comment: Journal preprint of arXiv:1811.08886 (IJCV, 2020). The first three authors contributed equally to this work. Project page: https://hanlab.mit.edu/projects/haq

arXiv.org e-Print Archive

DSpace@MIT